3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model

Abstract

In this paper, we build a multi-style generative model for stylish image captioning which uses multi-modality image features: ResNeXt features and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model on generating human-like captions by examining its performance on two datasets, the PERSONALITY-CAPTIONS dataset and the FlickrStyle10K dataset. We compare against a variety of state-of-the-art baselines on various automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE (code will be available at https://github.com/cici-ai-club/3M). A qualitative study has also been done to verify that our 3M model can be used for generating different stylized captions.

Similar resources

Image Clustering Using Multi-visual Features

This paper presents a research on clustering an image collection using multi-visual features. The proposed method extracted a set of visual features from each image and performed multi-dimensional K-Means clustering on the whole collection. Furthermore, this work experiments on different number of visual features combination for clustering. 2, 3, 5 and 7 pair of visual features chosen from a to...

Multi-image Matching Using Segment Features

This paper presents a strategy for matching features in multiple images, which emphasises reliable matching and the recovery of feature extraction errors. The process starts from initial ‘good’ matches, which are validated in multiple images using multi-image constraints. These initial matches are then filtered through a relaxation procedure and are subsequently used to locally predict addition...

Image Multi-Classification using PHOW Features

Automatic labeling and classification of a vast number of images is a huge challenge, so machines are used as a part of image classification and annotation is turned into a prerequisite to adapt to the high improvement of advanced digital image innovations consistently. Scale Invariant Feature Transform (SIFT) is an image descriptor for image-based matching and recognition; this descriptor is u...

A Mathematical Model for Multi-Region, Multi-Source, Multi-Period Generation Expansion Planning in Renewable Energy for Country-Wide Generation-Transmission Planning

Environmental pollution and rapid depletion are among the chief concerns about fossil fuels such as oil, gas, and coal. Renewable energy sources do not suffer from such limitations and are considered the best choice to replace fossil fuels. The present study develops a mathematical model for optimal allocation of regional renewable energy to meet a country-wide demand and its other essential as...

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

Journal

Journal title: Proceedings of the ... International Florida Artificial Intelligence Research Society Conference

Year: 2021

ISSN: 2334-0762, 2334-0754

DOI: https://doi.org/10.32473/flairs.v34i1.128380